Search CORE

1,140 research outputs found

A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption

Author: Bartlett Peter L.
Gabillon Victor
Valko Michal
Publication venue
Publication date: 01/01/2019
Field of study

We study the problem of optimizing a function under a \emph{budgeted number of evaluations}. We only assume that the function is \emph{locally} smooth around one of its global optima. The difficulty of optimization is measured in terms of 1) the amount of \emph{noise}

b

of the function evaluation and 2) the local smoothness,

d

, of the function. A smaller

d

results in smaller optimization error. We come with a new, simple, and parameter-free approach. First, for all values of

b

and

d

, this approach recovers at least the state-of-the-art regret guarantees. Second, our approach additionally obtains these results while being \textit{agnostic} to the values of both

b

and

d

. This leads to the first algorithm that naturally adapts to an \textit{unknown} range of noise

b

and leads to significant improvements in a moderate and low-noise regime. Third, our approach also obtains a remarkable improvement over the state-of-the-art SOO algorithm when the noise is very low which includes the case of optimization under deterministic feedback (

b=0

). There, under our minimal local smoothness assumption, this improvement is of exponential magnitude and holds for a class of functions that covers the vast majority of functions that practitioners optimize (

d=0

). We show that our algorithmic improvement is borne out in experiments as we empirically show faster convergence on common benchmarks

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

Local Rademacher complexities

Author: Bartlett Peter L.
Bousquet Olivier
Mendelson Shahar
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2005
Field of study

We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.Comment: Published at http://dx.doi.org/10.1214/009053605000000282 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Queensland University of Technology ePrints Archive

The Australian National University

MPG.PuRe

Bounding Embeddings of VC Classes into Maximum Classes

Author: Benjamin I. P. Rubinstein
Benjamin I. P. Rubinstein
J. Hyam Rubinstein
J. Hyam Rubinstein
Peter L. Bartlett
Peter L. Bartlett
Publication venue
Publication date: 28/01/2014
Field of study

One of the earliest conjectures in computational learning theory-the Sample Compression conjecture-asserts that concept classes (equivalently set systems) admit compression schemes of size linear in their VC dimension. To-date this statement is known to be true for maximum classes---those that possess maximum cardinality for their VC dimension. The most promising approach to positively resolving the conjecture is by embedding general VC classes into maximum classes without super-linear increase to their VC dimensions, as such embeddings would extend the known compression schemes to all VC classes. We show that maximum classes can be characterised by a local-connectivity property of the graph obtained by viewing the class as a cubical complex. This geometric characterisation of maximum VC classes is applied to prove a negative embedding result which demonstrates VC-d classes that cannot be embedded in any maximum class of VC dimension lower than 2d. On the other hand, we show that every VC-d class C embeds in a VC-(d+D) maximum class where D is the deficiency of C, i.e., the difference between the cardinalities of a maximum VC-d class and of C. For VC-2 classes in binary n-cubes for 4 <= n <= 6, we give best possible results on embedding into maximum classes. For some special classes of Boolean functions, relationships with maximum classes are investigated. Finally we give a general recursive procedure for embedding VC-d classes into VC-(d+k) maximum classes for smallest k.Comment: 22 pages, 2 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive

University of Melbourne Institutional Repository

Linear Programming for Large-Scale Markov Decision Problems

Author: Abbasi-Yadkori Yasin
Bartlett Peter L.
Malek Alan
Publication venue
Publication date: 01/01/2014
Field of study

We consider the problem of controlling a Markov decision process (MDP) with a large state space, so as to minimize average cost. Since it is intractable to compete with the optimal policy for large scale problems, we pursue the more modest goal of competing with a low-dimensional family of policies. We use the dual linear programming formulation of the MDP average cost problem, in which the variable is a stationary distribution over state-action pairs, and we consider a neighborhood of a low-dimensional subset of the set of stationary distributions (defined in terms of state-action features) as the comparison class. We propose two techniques, one based on stochastic convex optimization, and one based on constraint sampling. In both cases, we give bounds that show that the performance of our algorithms approaches the best achievable by any policy in the comparison class. Most importantly, these results depend on the size of the comparison class, but not on the size of the state space. Preliminary experiments show the effectiveness of the proposed algorithms in a queuing application.Comment: 27 pages, 3 figure

arXiv.org e-Print Archive

CiteSeerX

Queensland University of Technology ePrints Archive